A common concern when a policy-maker draws causal inferences and makes decisions from observational data is that the measured covariates are insufficiently rich to account for all sources of confounding, i.e., the standard no confoundedness assumption fails to hold. The recently proposed proximal causal inference framework shows that proxy variables can be leveraged to identify causal effects and therefore facilitate decision-making. Building upon this line of work, we propose a novel optimal individualized treatment regime based on so-called outcome-inducing and treatment-inducing confounding bridges. We then show that the value function of this new optimal treatment regime is superior to that of existing ones in the literature. Theoretical guarantees, including identification, superiority, and excess value bound of the estimated regime, are established. Moreover, we demonstrate the proposed optimal regime via numerical experiments and a real data application.
translated by 谷歌翻译
准确的牙齿体积分割是计算机辅助牙齿分析的先决条件。基于深度学习的牙齿分割方法已经达到了令人满意的表现,但需要大量的牙齿数据。公开可用的牙科数据是有限的,这意味着无法在临床实践中复制,评估和应用现有方法。在本文中,我们建立了一个3D Dental CBCT数据集Ctooth+,具有22个完全注释的卷和146个未标记的体积。我们进一步评估了基于完全监督的学习,半监督学习和积极学习的几种最先进的牙齿量细分策略,并定义了绩效原则。这项工作为牙齿体积分割任务提供了新的基准,该实验可以作为未来基于AI的牙科成像研究和临床应用开发的基线。
translated by 谷歌翻译
近年来,Experts(MOE)的混合物已成为一种有前途的深度学习技术,可以将模型能力扩展为万亿多个参数,同时通过稀疏计算降低计算成本。虽然MoE开设了一个非常大的模型的新领域,但由于MOE的动态性质与系统的静态平行性/管道层之间的不匹配,因此其数以千计的GPU的实现受到限制。我们提出了Tutel,这是一种具有动态自适应并行性和管道的高度可扩展的堆栈设计和实现。 TUTEL在运行时提供自适应并行性切换和自适应管道,分别达到1.74倍和2.00倍的单MOE层加速度。我们还提出了一种用于MOE通信速度的新颖的二维层次结构算法,该算法的表现超过了2,048 GPU的先前最先前的最新时间。 Tutel汇总了所有技术,最终在16 GPU和2,048 GPU上分别提供了4.96倍和5.75倍的加速度,分别通过Fairseq:Meta的Facebook AI AI研究序列到序列工具Kit(Tutel(Tutel)(Tutel)(Tutel)(现在由Fairseq部分采用)。 Tutel源代码可在公共场所获得:https://github.com/microsoft/tutel。我们的评估表明,Tutel有效,有效地运行了一个基于现实的MOE模型,名为Swinv2-Moe,建立在Swin Transformer V2上,这是一种最先进的计算机视觉体系结构。在效率方面,Tutel加速了Swinv2-MoE,在FairSeq的训练和推理中分别达到1.55倍和2.11倍的速度。关于有效性,SWINV2-MOE模型在预训练和下游计算机视觉任务(例如可可对象检测)方面都比对应的密度密度模型都达到了卓越的精度,这表明Tutel准备对端到端现实世界模型训练的准备就绪和推理。 Swinv2-Moe在https://github.com/microsoft/swin-transformer中开放。
translated by 谷歌翻译
本文介绍了Pytsk,这是一种用于开发Takagi-Sugeno-Kang(TSK)模糊系统的Python工具箱。基于Scikit-Learn和Pytorch,PYTSK允许用户使用基于模糊的聚类或基于迷你批处理梯度下降(MBGD)算法优化TSK模糊系统。工具箱中实现了几种基于MBGD的最先进的优化算法,这可以改善TSK模糊系统的概括性能,尤其是对于大数据应用程序。PYTSK也可以轻松扩展和定制,以用于更复杂的算法,例如修改TSK模糊系统的结构,开发更复杂的训练算法,并将TSK模糊系统与神经网络相结合。可以在https://github.com/yuqicui/pytsk上找到PYTSK的代码。
translated by 谷歌翻译
深度加强学习(DRL)在解决许多应用中解决了序贯决策时的巨大潜力。尽管在现实世界的情景下部署DRL时,尽管具有很有希望的表现,但存在实际差距。一个主要障碍是过度拟合的问题,导致DRL学习的政策的普遍性差。特别是,对于具有观测数据的离线DRL,模型选择是一个具有挑战性的任务,因为没有用于仿真环境的绩效演示的地面真相,与模拟环境的在线设置相比。在这项工作中,我们提出了一种悲观的模型选择(PMS)方法,用于脱机DRL,具有理论上的保证,它具有可用于查找一组候选模型中的最佳政策的可怕有效框架。还提出了两种精致的方法来解决DRL模型在识别最佳政策方面的潜在偏见。数值研究表明了我们对现有方法方法的卓越性能。
translated by 谷歌翻译
基于森林的方法最近在非参数治疗效应估计中获得了普及。在这一工作方面,我们引入了因果生存森林,可用于在可能右估计结果的生存和观察环境中估计异质治疗效果。我们的方法依赖于正交估计方程来在不满意的情况下对审查和选择效果进行鲁棒性调整。在我们的实验中,我们发现相对于许多基线的表现良好的方法。
translated by 谷歌翻译
A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
translated by 谷歌翻译
When using LiDAR semantic segmentation models for safety-critical applications such as autonomous driving, it is essential to understand and improve their robustness with respect to a large range of LiDAR corruptions. In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions. To rigorously evaluate the robustness and generalizability of current approaches, we propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy. Then, we systematically investigate 11 LiDAR semantic segmentation models, especially spanning different input representations (e.g., point clouds, voxels, projected images, and etc.), network architectures and training schemes. Through this study, we obtain two insights: 1) We find out that the input representation plays a crucial role in robustness. Specifically, under specific corruptions, different representations perform variously. 2) Although state-of-the-art methods on LiDAR semantic segmentation achieve promising results on clean data, they are less robust when dealing with noisy data. Finally, based on the above observations, we design a robust LiDAR segmentation model (RLSeg) which greatly boosts the robustness with simple but effective modifications. It is promising that our benchmark, comprehensive analysis, and observations can boost future research in robust LiDAR semantic segmentation for safety-critical applications.
translated by 谷歌翻译
Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples. However, their iterative refinement process in high-dimensional data space results in slow inference speed, which restricts their application in real-time systems. Previous works have explored speeding up by minimizing the number of inference steps but at the cost of sample quality. In this work, to improve the inference speed for DDPM-based TTS model while achieving high sample quality, we propose ResGrad, a lightweight diffusion model which learns to refine the output spectrogram of an existing TTS model (e.g., FastSpeech 2) by predicting the residual between the model output and the corresponding ground-truth speech. ResGrad has several advantages: 1) Compare with other acceleration methods for DDPM which need to synthesize speech from scratch, ResGrad reduces the complexity of task by changing the generation target from ground-truth mel-spectrogram to the residual, resulting into a more lightweight model and thus a smaller real-time factor. 2) ResGrad is employed in the inference process of the existing TTS model in a plug-and-play way, without re-training this model. We verify ResGrad on the single-speaker dataset LJSpeech and two more challenging datasets with multiple speakers (LibriTTS) and high sampling rate (VCTK). Experimental results show that in comparison with other speed-up methods of DDPMs: 1) ResGrad achieves better sample quality with the same inference speed measured by real-time factor; 2) with similar speech quality, ResGrad synthesizes speech faster than baseline methods by more than 10 times. Audio samples are available at https://resgrad1.github.io/.
translated by 谷歌翻译
Crowdsourcing, in which human intelligence and productivity is dynamically mobilized to tackle tasks too complex for automation alone to handle, has grown to be an important research topic and inspired new businesses (e.g., Uber, Airbnb). Over the years, crowdsourcing has morphed from providing a platform where workers and tasks can be matched up manually into one which leverages data-driven algorithmic management approaches powered by artificial intelligence (AI) to achieve increasingly sophisticated optimization objectives. In this paper, we provide a survey presenting a unique systematic overview on how AI can empower crowdsourcing - which we refer to as AI-Empowered Crowdsourcing(AIEC). We propose a taxonomy which divides algorithmic crowdsourcing into three major areas: 1) task delegation, 2) motivating workers, and 3) quality control, focusing on the major objectives which need to be accomplished. We discuss the limitations and insights, and curate the challenges of doing research in each of these areas to highlight promising future research directions.
translated by 谷歌翻译